Asymptotic Optimality for Decentralised Bandits

Abstract

We consider a large number of agents collaborating on a multi-armed bandit problem with a large number of arms. The goal is to minimise the regret of each agent in a communication-constrained setting. We present a decentralised algorithm which builds upon and improves the Gossip-Insert-Eliminate method of Chawla et al. (International Conference on Artificial Intelligence and Statistics, pp 3471–3481, 2020). We provide a theoretical analysis of the regret incurred which shows that our algorithm is asymptotically optimal. In fact, our regret guarantee matches the asymptotically optimal rate achievable in the full communication setting. Finally, we present empirical results which support our conclusions.
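The abstract measures performance by the regret of each agent. As an illustration only, here is a minimal Python sketch of the standard cumulative pseudo-regret, computed separately for each agent, assuming the arm means are known and each agent's sequence of pulled arms has been recorded. The function names and the data layout (a dict from a hypothetical agent id to a list of pulled arm indices) are assumptions made for this example; it shows the textbook quantity being minimised, not the paper's algorithm or its analysis.

```python
from typing import Dict, List, Sequence


def pseudo_regret(arm_means: Sequence[float], pulls: Sequence[int]) -> float:
    """Cumulative pseudo-regret of one agent after len(pulls) rounds.

    In each round the agent pays the gap between the best arm's mean
    and the mean of the arm it actually pulled.
    """
    best = max(arm_means)
    return sum(best - arm_means[a] for a in pulls)


def per_agent_regret(
    arm_means: Sequence[float], pulls_by_agent: Dict[str, List[int]]
) -> Dict[str, float]:
    """Pseudo-regret for every agent (hypothetical layout: agent id -> pulled arms)."""
    return {
        agent: pseudo_regret(arm_means, pulls)
        for agent, pulls in pulls_by_agent.items()
    }


if __name__ == "__main__":
    # Hypothetical arm means; arm 0 is the optimal arm.
    means = [0.9, 0.5, 0.3]
    # Hypothetical histories: agent_0 explores before settling on arm 0.
    history = {"agent_0": [1, 2, 0, 0, 0], "agent_1": [0, 0, 0, 0, 0]}
    print(per_agent_regret(means, history))
```

An agent that only ever pulls the best arm accumulates zero pseudo-regret; every pull of a suboptimal arm adds that arm's gap to the total.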

Similar resources

DCOPs and bandits: exploration and exploitation in decentralised coordination

Real life coordination problems are characterised by stochasticity and a lack of a priori knowledge about the interactions between agents. However, decentralised constraint optimisation problems (DCOPs), a widely adopted framework for modelling decentralised coordination tasks, assumes perfect knowledge of these factors, thus limiting its practical applicability. To address this shortcoming, we...

Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors

In stochastic bandit problems, a Bayesian policy called Thompson sampling (TS) has recently attracted much attention for its excellent empirical performance. However, the theoretical analysis of this policy is difficult and its asymptotic optimality is only proved for one-parameter models. In this paper we discuss the optimality of TS for the model of normal distributions with unknown means and...

Asymptotic Optimality of Balanced Routing

Consider a system with K parallel single-servers, each with its own waiting room. Upon arrival, a job is to be routed to the queue of one of the servers. Finding a routing policy that minimizes the total workload in the system is a known difficult problem in general. Even if the optimal policy is identified, the policy would require the full queue length information at the arrival of each job; fo...

Normal Bandits of Unknown Means and Variances: Asymptotic Optimality, Finite Horizon Regret Bounds, and a Solution to an Open Problem

Consider the problem of sampling sequentially from a finite number of $N \geq 2$ populations, specified by random variables $X_k^i$, $i = 1, \ldots, N$, and $k = 1, 2, \ldots$; where $X_k^i$ denotes the outcome from population $i$ the $k$th time it is sampled. It is assumed that for each fixed $i$, $\{X_k^i\}_{k \geq 1}$ is a sequence of i.i.d. normal random variables, with unknown mean $\mu_i$ and unknown variance $\sigma_i^2$. The objec...

Active Search and Bandits on Graphs using Sigma-Optimality

Many modern information access problems involve highly complex patterns that cannot be handled by traditional keyword based search. Active Search is an emerging paradigm that helps users quickly find relevant information by efficiently collecting and learning from user feedback. We consider active search on graphs, where the nodes represent the set of instances users want to search over and the...

Journal

Journal title: Dynamic Games and Applications

Year: 2022

ISSN: ['2153-0793', '2153-0785']

DOI: https://doi.org/10.1007/s13235-022-00451-1